
Keyword Search Result

[Keyword] convolutional neural networks (33 hits)

Results 21-33 of 33

  • SDChannelNets: Extremely Small and Efficient Convolutional Neural Networks

JianNan ZHANG, JiJun ZHOU, JianFeng WU, ShengYing YANG

LETTER-Biocybernetics, Neurocomputing
Publicized: 2019/09/10
Vol: E102-D No:12
Page(s): 2646-2650

Convolutional neural networks (CNNs) have a strong ability to understand and judge images. However, the enormous number of parameters and the computation cost of CNNs have limited their application on resource-limited devices. In this letter, we used the ideas of parameter sharing and dense connection to compress the parameters along the channel direction of the convolution kernels, greatly reducing the number of model parameters. On this basis, we designed Shared and Dense Channel-wise Convolutional Networks (SDChannelNets), composed mainly of depth-wise separable SD-channel-wise convolution layers. The advantage of SDChannelNets is that the number of model parameters is greatly reduced with little or no loss of accuracy. We also introduced a hyperparameter that effectively balances the number of parameters against model accuracy. We evaluated the proposed model on two popular image recognition tasks (CIFAR-10 and CIFAR-100). The results showed that SDChannelNets achieved accuracy similar to other CNNs, with a greatly reduced number of parameters.
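The abstract does not spell out the layer definition, so the following is only a minimal sketch of the channel-direction parameter-sharing idea: one small depthwise kernel bank is reused across channel groups, with the number of groups standing in for the paper's balancing hyperparameter. All layer shapes here are assumptions.

```python
# Hedged sketch: channel-direction parameter sharing in a depthwise-separable
# block. The real SD-channel-wise layer may differ; `groups` is assumed to
# play the role of the paper's parameter/accuracy hyperparameter.
import torch
import torch.nn as nn

class SharedChannelwiseConv(nn.Module):
    """Depthwise conv whose kernels are shared across `groups` channel groups."""
    def __init__(self, channels, kernel_size=3, groups=4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        # One depthwise kernel bank for channels // groups channels,
        # reused `groups` times -> far fewer parameters.
        self.shared = nn.Conv2d(channels // groups, channels // groups,
                                kernel_size, padding=kernel_size // 2,
                                groups=channels // groups, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        chunks = torch.chunk(x, self.groups, dim=1)   # split along channels
        out = torch.cat([self.shared(c) for c in chunks], dim=1)
        return self.pointwise(out)                    # depthwise-separable style

x = torch.randn(1, 64, 32, 32)
print(SharedChannelwiseConv(64, groups=4)(x).shape)   # torch.Size([1, 64, 32, 32])
```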

  • Cross-Domain Deep Feature Combination for Bird Species Classification with Audio-Visual Data

Naranchimeg BOLD, Chao ZHANG, Takuya AKASHI

PAPER-Multimedia Pattern Processing
Publicized: 2019/06/27
Vol: E102-D No:10
Page(s): 2033-2042

In the past decade, many state-of-the-art algorithms for both image classification and audio classification have achieved notable success through the development of deep convolutional neural networks (CNNs). However, most of these works exploit only a single type of training data. In this paper, we present a study on classifying bird species by combining visual (image) and audio (sound) data with CNNs, a combination that has been sparsely treated so far. Specifically, we propose CNN-based multimodal learning models with three fusion strategies (early, middle, and late) to address the issue of combining training data across domains. The advantage of our proposed method lies in the fact that CNNs are used not only to extract features from image and audio data (spectrograms) but also to combine the features across modalities. In the experiments, we train and evaluate the network structures on the comprehensive CUB-200-2011 standard dataset combined with our originally collected audio dataset, matched by species. We observe that a model utilizing both types of data outperforms models trained with only one type. We also show that transfer learning can significantly improve classification performance.
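Below is a minimal sketch of one of the three strategies, late fusion: two CNN branches (image and spectrogram) are pooled and concatenated before a shared classifier. The branch depths and feature widths are placeholders, not the paper's architecture.

```python
# Hedged sketch of late fusion for audio-visual bird classification.
# Branch layouts and sizes are assumptions; only the fusion pattern matters.
import torch
import torch.nn as nn

def branch(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> (N, 64)

class LateFusionNet(nn.Module):
    def __init__(self, num_species=200):        # CUB-200-2011 has 200 classes
        super().__init__()
        self.visual = branch(3)                  # RGB image branch
        self.audio = branch(1)                   # 1-channel spectrogram branch
        self.head = nn.Linear(64 + 64, num_species)

    def forward(self, image, spectrogram):
        f = torch.cat([self.visual(image), self.audio(spectrogram)], dim=1)
        return self.head(f)                      # fused class logits

net = LateFusionNet()
logits = net(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 128, 128))
print(logits.shape)                              # torch.Size([2, 200])
```

Early or middle fusion would instead concatenate the raw inputs or the intermediate feature maps, respectively, before further convolution.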

  • TDCTFIC: A Novel Recommendation Framework Fusing Temporal Dynamics, CNN-Based Text Features and Item Correlation

Meng Ting XIONG, Yong FENG, Ting WU, Jia Xing SHANG, Bao Hua QIANG, Ya Nan WANG

PAPER-Data Engineering, Web Information Systems
Publicized: 2019/05/14
Vol: E102-D No:8
Page(s): 1517-1525

A traditional recommendation system (RS) learns users' potential personal preferences and items' potential attribute characteristics from the rating records between users and items. However, for new items with no historical rating records, a traditional RS typically suffers from the cold start problem. Auxiliary information is usually brought in for item cold start recommendation; we further incorporate temporal dynamics, text, and item correlation into our models to relieve it. We propose two new cold start recommendation models, TmTx (time, text) and TmTI (time, text, item correlation), for different cold start scenarios. Well-known methods such as TimeSVD++ and CoFactor take temporal dynamics, comments, or item correlations partially into consideration to address the cold start problem, but none of them combines all of this information. The two models proposed in this paper fuse features such as time, text, and item correlation, and can effectively improve performance under item cold start. We select a convolutional neural network (CNN) to extract features from item description text, which gives the models the ability to deal with cold start items. Experimental results on three real-world data sets show that our proposed models achieve significant improvements over the baseline methods.
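The component that handles cold-start items is the CNN over item description text. Below is a minimal TextCNN-style sketch of such an extractor; the vocabulary size, embedding width, filter widths, and output dimension are assumptions, not values from the paper.

```python
# Hedged sketch of a CNN text-feature extractor for item descriptions.
# All hyperparameters below are placeholders.
import torch
import torch.nn as nn

class ItemTextCNN(nn.Module):
    def __init__(self, vocab=20000, emb=128, out_dim=64, widths=(3, 4, 5)):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        # Parallel 1-D convolutions with different filter widths.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb, 100, w, padding=w // 2) for w in widths])
        self.proj = nn.Linear(100 * len(widths), out_dim)

    def forward(self, tokens):                 # tokens: (N, seq_len) int ids
        x = self.emb(tokens).transpose(1, 2)   # (N, emb, seq_len)
        pooled = [c(x).max(dim=2).values for c in self.convs]  # max over time
        return self.proj(torch.cat(pooled, dim=1))             # (N, out_dim)

feats = ItemTextCNN()(torch.randint(0, 20000, (4, 60)))
print(feats.shape)                             # torch.Size([4, 64])
```

A new item with no ratings still has a description, so this vector can stand in for the missing latent item factors.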

  • MF-CNN: Traffic Flow Prediction Using Convolutional Neural Network and Multi-Features Fusion

Di YANG, Songjiang LI, Zhou PENG, Peng WANG, Junhui WANG, Huamin YANG

PAPER-Artificial Intelligence, Data Mining
Publicized: 2019/05/20
Vol: E102-D No:8
Page(s): 1526-1536

Accurate traffic flow prediction is a precondition for many applications in intelligent transportation systems, such as traffic control and route guidance. Traditional data-driven traffic flow prediction models tend to ignore traffic self-features (e.g., periodicities) and commonly suffer from shifts brought about by various complex factors (e.g., weather and holidays), which reduce the precision and robustness of the prediction models. To tackle this problem, we propose a CNN-based multi-feature predictive model (MF-CNN) that collectively predicts network-scale traffic flow from multiple spatiotemporal features and external factors (weather and holidays). Specifically, we classify traffic self-features into temporal continuity as the short-term feature and daily and weekly periodicity as long-term features, then map them to three two-dimensional matrices, each composed of a time axis and a space axis. The high-level spatiotemporal features learned by CNNs from the matrices with different time lags are further fused with the external factors by a logistic regression layer to derive the final prediction. Experimental results indicate that the multi-feature MF-CNN model improves predictive performance over five baseline models and achieves a good trade-off between accuracy and efficiency.
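A minimal sketch of the fusion pattern follows: the three time-by-space matrices (recent, daily, weekly) each pass through a CNN, and the features are fused with external-factor flags by a final linear layer, in the spirit of the paper's logistic regression layer. All tensor sizes are placeholders.

```python
# Hedged sketch of MF-CNN-style multi-feature fusion; layer widths,
# matrix sizes, and the external-factor encoding are assumptions.
import torch
import torch.nn as nn

class MFCNN(nn.Module):
    def __init__(self, n_links=64, n_external=8):
        super().__init__()
        def cnn():                                   # one CNN per time lag
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> (N, 16)
        self.recent, self.daily, self.weekly = cnn(), cnn(), cnn()
        # Fusion layer over CNN features plus external factors,
        # predicting flow for every road link.
        self.fuse = nn.Linear(16 * 3 + n_external, n_links)

    def forward(self, recent, daily, weekly, external):
        f = torch.cat([self.recent(recent), self.daily(daily),
                       self.weekly(weekly), external], dim=1)
        return self.fuse(f)

m = MFCNN()
x = lambda: torch.randn(2, 1, 12, 64)                # (N, 1, time, space)
print(m(x(), x(), x(), torch.randn(2, 8)).shape)     # torch.Size([2, 64])
```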

  • A ReRAM-Based Row-Column-Oriented Memory Architecture for Convolutional Neural Networks

Yan CHEN, Jing ZHANG, Yuebing XU, Yingjie ZHANG, Renyuan ZHANG, Yasuhiko NAKASHIMA

BRIEF PAPER
Vol: E102-C No:7
Page(s): 580-584

An efficient resistive random access memory (ReRAM) structure is developed to accelerate convolutional neural networks (CNNs) through in-memory computation. A novel ReRAM cell circuit is designed with two-directional (2-D) accessibility, and the entire memory system is organized as a 2-D array in which specific memory cells can be identically accessed by both column and row locality. For the in-memory computations of CNNs, only the relevant cells in an identical sub-array are accessed by 2-D read-out operations, which can hardly be implemented with conventional ReRAM cells. In this manner, the redundant (column or row) accesses of conventional ReRAM structures are avoided, eliminating unnecessary data movement when CNNs are processed in memory. Simulation results show that the energy and bandwidth efficiency of the proposed memory structure are 1.4x and 5x those of a state-of-the-art ReRAM architecture, respectively.
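The paper describes a circuit, so the following is only a behavioral illustration of the access pattern: a 2-D read-out touches just the cells selected by both row and column locality, whereas a conventional row-oriented read moves whole rows and discards the unwanted columns. It models data movement counts only, not the ReRAM cell.

```python
# Purely behavioral sketch of 2-D sub-array read-out vs. conventional
# row reads; this is an illustration of the access pattern, not a
# model of the proposed cell circuit.
import numpy as np

array = np.arange(64).reshape(8, 8)            # the 2-D memory array

def read_2d(rows, cols):
    """Read only the cells addressed by both row- and column-select lines."""
    return array[np.ix_(rows, cols)]

def read_rows_conventional(rows):
    """Conventional read: whole rows come out, unused columns are discarded."""
    return array[rows, :]

sub = read_2d(rows=[2, 3, 4], cols=[1, 2, 3])      # 9 cells moved
full = read_rows_conventional(rows=[2, 3, 4])      # 24 cells moved
print(sub.size, full.size)                         # 9 24
```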

  • Combining 3D Convolutional Neural Networks with Transfer Learning by Supervised Pre-Training for Facial Micro-Expression Recognition

Ruicong ZHI, Hairui XU, Ming WAN, Tingting LI

PAPER-Pattern Recognition
Publicized: 2019/01/29
Vol: E102-D No:5
Page(s): 1054-1064

Facial micro-expressions are momentary and subtle facial reactions, and automatically recognizing them with high accuracy remains challenging in practical applications. Extracting spatiotemporal features from facial image sequences is essential for facial micro-expression recognition. In this paper, we employed 3D convolutional neural networks (3D-CNNs) for self-learned feature extraction to represent facial micro-expressions effectively, since 3D-CNNs extract spatiotemporal features from facial image sequences well. Moreover, transfer learning was utilized to deal with the problem of insufficient samples in facial micro-expression databases. We first pre-trained the 3D-CNNs on the normal facial expression database Oulu-CASIA by supervised learning, and then effectively transferred the pre-trained model to the target domain, the facial micro-expression recognition task. The proposed method was evaluated on two available facial micro-expression datasets, CASME II and SMIC-HS. We obtained overall accuracies of 97.6% on CASME II and 97.4% on SMIC, which are 3.4% and 1.6% higher, respectively, than those of the 3D-CNNs model without transfer learning. The experimental results demonstrate that our method achieves superior performance compared to state-of-the-art methods.
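A minimal sketch of the transfer strategy follows: pre-train a 3D-CNN with supervision on the normal-expression task, then reuse the backbone with a new classifier head for micro-expressions. The backbone layout and class counts are placeholders.

```python
# Hedged sketch of supervised pre-training followed by head replacement;
# the 3D-CNN architecture here is an assumption, not the paper's.
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool3d(2),
    nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten())       # -> (N, 32)

# Stage 1: supervised pre-training on normal expressions (6 classes assumed).
pretrain_net = nn.Sequential(backbone, nn.Linear(32, 6))
# ... train pretrain_net on Oulu-CASIA here ...

# Stage 2: transfer -- reuse the same backbone, new micro-expression head.
micro_net = nn.Sequential(backbone, nn.Linear(32, 5))
clips = torch.randn(2, 1, 16, 64, 64)            # (N, C, frames, H, W)
print(micro_net(clips).shape)                    # torch.Size([2, 5])
```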

  • Object Tracking by Unified Semantic Knowledge and Instance Features

Suofei ZHANG, Bin KANG, Lin ZHOU

LETTER-Image Recognition, Computer Vision
Publicized: 2018/11/30
Vol: E102-D No:3
Page(s): 680-683

Deep learning methods based on instance features boost the performance of high-speed object tracking systems by directly comparing the target with its template during training and tracking. However, from the perspective of the human vision system, prior knowledge of the target also plays a key role during tracking. To integrate both semantic knowledge and instance features, we propose a convolutional-network-based object tracking framework that simultaneously outputs bounding boxes based on different prior assumptions, together with the confidence of each assumption. Experimental results show that our proposed approach attains both higher accuracy and higher efficiency than other leading methods on tracking tasks covering most daily objects.
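One way to read "boxes per assumption plus confidences" is a shared backbone with several heads, each emitting one box and one confidence. The sketch below is an assumed structure in that spirit; the head count and feature sizes are placeholders.

```python
# Hedged sketch: multi-head tracker, one head per prior assumption,
# each emitting 4 box coordinates plus a confidence. Structure assumed.
import torch
import torch.nn as nn

class MultiAssumptionTracker(nn.Module):
    def __init__(self, num_assumptions=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())      # -> (N, 32)
        self.heads = nn.ModuleList(
            [nn.Linear(32, 5) for _ in range(num_assumptions)])

    def forward(self, frame):
        f = self.backbone(frame)
        out = torch.stack([h(f) for h in self.heads], dim=1)  # (N, A, 5)
        boxes, conf = out[..., :4], out[..., 4].sigmoid()
        return boxes, conf        # keep the box whose confidence is highest

boxes, conf = MultiAssumptionTracker()(torch.randn(1, 3, 128, 128))
best = boxes[0, conf[0].argmax()]      # the winning assumption's box
print(best.shape)                      # torch.Size([4])
```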

  • A Two-Stage Crack Detection Method for Concrete Bridges Using Convolutional Neural Networks

Yundong LI, Weigang ZHAO, Xueyan ZHANG, Qichen ZHOU

LETTER-Artificial Intelligence, Data Mining
Publicized: 2018/09/05
Vol: E101-D No:12
Page(s): 3249-3252

Crack detection is a vital task for maintaining the health and safety of a bridge. Traditional computer-vision-based methods easily suffer from noise and clutter in real bridge inspections. To address this limitation, we propose a two-stage crack detection approach based on convolutional neural networks (CNNs) in this letter. A predictor with a small receptive field is exploited in the first detection stage, while another predictor with a large receptive field refines the detection results in the second stage. By fusing the confidence maps produced by both predictors, our method can accurately predict the probability that each pixel belongs to a cracked area. Experimental results show that the proposed method is superior to an up-to-date method on real concrete surface images.
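A minimal sketch of the two-stage idea follows: two toy per-pixel predictors whose receptive fields differ by kernel size, with their confidence maps fused. Averaging is an assumed fusion rule; the letter's actual predictors and fusion may differ.

```python
# Hedged sketch: small- vs. large-receptive-field crack predictors and
# confidence-map fusion; architectures and the fusion rule are assumptions.
import torch
import torch.nn as nn

def predictor(kernel):
    # Kernel size controls the receptive field of this toy predictor.
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel, padding=kernel // 2), nn.ReLU(),
        nn.Conv2d(16, 1, 1), nn.Sigmoid())    # per-pixel crack probability

small_rf = predictor(kernel=3)     # stage 1: local detail
large_rf = predictor(kernel=15)    # stage 2: wider context, refines stage 1

img = torch.randn(1, 3, 256, 256)
conf = 0.5 * (small_rf(img) + large_rf(img))   # fused confidence map
crack_mask = conf > 0.5                        # final per-pixel decision
print(conf.shape, crack_mask.float().mean().item())
```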

  • A Fully-Blind and Fast Image Quality Predictor with Convolutional Neural Networks

Zhengxue CHENG, Masaru TAKEUCHI, Kenji KANAI, Jiro KATTO

PAPER-Image
Vol: E101-A No:9
Page(s): 1557-1566

Image quality assessment (IQA) is an inherent problem in the field of image processing. Recently, deep-learning-based IQA has attracted increased attention owing to its high prediction accuracy. In this paper, we propose a fully-blind and fast image quality predictor (FFIQP) using convolutional neural networks, built on two strategies. First, we propose a distortion clustering strategy based on the distribution function of intermediate-layer results in the convolutional neural network (CNN) to make IQA fully blind. Second, by analyzing the relationship between image saliency information and CNN prediction error, we utilize a pre-saliency map to skip non-salient patches, accelerating IQA. Experimental results verify that our method achieves high accuracy (0.978) against subjective quality scores, outperforming existing IQA methods. Moreover, the proposed method is computationally appealing, offering flexible complexity-performance trade-offs through the threshold applied to the saliency map.
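The acceleration strategy can be sketched on its own: score each patch's mean pre-saliency, skip patches below a threshold, and run the quality CNN only on the rest. The saliency source, patch size, threshold, and the stand-in scorer below are all assumptions.

```python
# Hedged sketch of saliency-gated patch scoring; only the skip logic
# reflects the paper, everything else is a placeholder.
import torch
import torch.nn as nn

quality_cnn = nn.Sequential(               # stand-in patch-quality scorer
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

def predict_quality(image, saliency, patch=32, thresh=0.3):
    scores = []
    _, H, W = image.shape
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            if saliency[y:y + patch, x:x + patch].mean() < thresh:
                continue                    # non-salient patch: skipped
            p = image[:, y:y + patch, x:x + patch].unsqueeze(0)
            scores.append(quality_cnn(p))
    return torch.cat(scores).mean()         # pooled image-level score

img, sal = torch.randn(3, 256, 256), torch.rand(256, 256)
print(predict_quality(img, sal).item())
```

Raising `thresh` skips more patches, trading accuracy for speed, which is the flexible complexity-performance knob the abstract describes.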

  • End-to-End Exposure Fusion Using Convolutional Neural Network

Jinhua WANG, Weiqiang WANG, Guangmei XU, Hongzhe LIU

LETTER-Image Recognition, Computer Vision
Publicized: 2017/11/22
Vol: E101-D No:2
Page(s): 560-563

In this paper, we describe the direct learning of an end-to-end mapping between under-/over-exposed images and well-exposed images. The mapping is represented as a deep convolutional neural network (CNN) that takes multiple-exposure images as input and outputs a high-quality image. Our CNN has a lightweight structure yet gives state-of-the-art fusion quality. Furthermore, for a given pixel, the influence of the surrounding pixels grows as their distance decreases; if only the pixels within the convolution kernel's neighborhood are considered, the final result suffers. To overcome this problem, the kernel size is often increased, but this also increases the complexity of the network (too many parameters) and the training time. In this paper, we present a method in which a number of sub-images of the source image are obtained and processed with the same CNN model, providing more neighborhood information for the convolution operation. Experimental results demonstrate that the proposed method achieves better performance in terms of both objective evaluation and visual quality.
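The end-to-end mapping itself can be sketched simply: concatenate the exposure stack along the channel axis and regress the fused image with a small CNN. The layer widths are placeholders, and the sub-image trick described above is omitted here.

```python
# Hedged sketch of an end-to-end exposure-fusion CNN; the architecture
# is an assumption, only the input/output contract reflects the paper.
import torch
import torch.nn as nn

class ExposureFusionCNN(nn.Module):
    def __init__(self, n_exposures=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * n_exposures, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())  # fused RGB in [0,1]

    def forward(self, exposures):             # list of (N, 3, H, W) images
        return self.net(torch.cat(exposures, dim=1))

under = torch.rand(1, 3, 128, 128)
over = torch.rand(1, 3, 128, 128)
fused = ExposureFusionCNN()([under, over])
print(fused.shape)                            # torch.Size([1, 3, 128, 128])
```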

  • Feature Adaptive Correlation Tracking

Yulong XU, Yang LI, Jiabao WANG, Zhuang MIAO, Hang LI, Yafei ZHANG

LETTER-Image Recognition, Computer Vision
Publicized: 2016/11/28
Vol: E100-D No:3
Page(s): 594-597

The feature extractor plays an important role in visual tracking, but most state-of-the-art methods employ the same feature representation in all scenes. Taking this diversity into account, a tracker should choose different features according to the video. In this work, we propose a novel feature-adaptive correlation tracker that decomposes the tracking task into translation and scale estimation. According to the luminance of the target, our approach automatically selects either hierarchical convolutional features or histogram-of-oriented-gradient (HOG) features for translation estimation in varied scenarios. Furthermore, we employ a discriminative correlation filter to handle scale variations. Extensive experiments on a large-scale challenging benchmark dataset show that the proposed algorithm outperforms state-of-the-art trackers in accuracy and robustness.
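The selection rule itself is simple to sketch: pick the feature type from the target's luminance. The threshold, which feature goes with dark vs. bright targets, and the stand-in extractors below are all assumptions; a real system would use a deep network and a proper HOG implementation (e.g. skimage.feature.hog).

```python
# Hedged sketch of luminance-driven feature selection; the extractors
# are toy stand-ins and the 0.5 threshold is an assumption.
import numpy as np

def hog_features(patch):
    # Stand-in for a real HOG extractor.
    gy, gx = np.gradient(patch)
    return np.histogram(np.arctan2(gy, gx), bins=9)[0].astype(float)

def cnn_features(patch):
    # Stand-in for hierarchical convolutional features from a deep net.
    return patch.reshape(-1)[:9]

def select_features(patch, luminance_thresh=0.5):
    if patch.mean() < luminance_thresh:      # dark target: gradient cues
        return hog_features(patch)
    return cnn_features(patch)               # bright target: deep features

patch = np.random.rand(32, 32)               # grayscale target patch in [0,1]
print(select_features(patch).shape)          # (9,)
```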

  • Multi-Channel Convolutional Neural Networks for Image Super-Resolution

Shinya OHTANI, Yu KATO, Nobutaka KUROKI, Tetsuya HIROSE, Masahiro NUMA

PAPER-Image Processing
Vol: E100-A No:2
Page(s): 572-580

This paper proposes image super-resolution techniques based on multi-channel convolutional neural networks. In the proposed method, output pixels are classified into K×K groups depending on their coordinates. These groups are generated from separate channels of a convolutional neural network (CNN) and finally synthesized into a K×K magnified image. This architecture enlarges images directly, without bicubic interpolation. Experimental results for 2×2, 3×3, and 4×4 magnification show that the average PSNR of the proposed method is about 0.2 dB higher than that of the conventional SRCNN.
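The K×K output grouping corresponds to a channel-to-space rearrangement: the final convolution emits K×K channels per color, which are interleaved into the magnified image. The sketch below uses torch's PixelShuffle as an equivalent of that synthesis step; whether the paper implements it this way is not stated, and the layer widths are placeholders.

```python
# Hedged sketch of K x K multi-channel super-resolution; PixelShuffle
# stands in for the paper's channel-to-pixel synthesis.
import torch
import torch.nn as nn

class MultiChannelSR(nn.Module):
    def __init__(self, K=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.to_groups = nn.Conv2d(32, K * K, 3, padding=1)  # K*K channels
        self.shuffle = nn.PixelShuffle(K)    # interleave channels spatially

    def forward(self, lowres):               # (N, 1, H, W), no bicubic input
        return self.shuffle(self.to_groups(self.features(lowres)))

sr = MultiChannelSR(K=3)
print(sr(torch.randn(1, 1, 40, 40)).shape)   # torch.Size([1, 1, 120, 120])
```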

  • Food Image Recognition Using Covariance of Convolutional Layer Feature Maps

Atsushi TATSUMA, Masaki AONO

LETTER-Image Recognition, Computer Vision
Publicized: 2016/02/23
Vol: E99-D No:6
Page(s): 1711-1715

Recent studies have achieved superior performance in image recognition tasks by using, as the image representation, the fully-connected-layer activations of convolutional neural networks (CNNs) trained on various kinds of images. However, this CNN representation is not well suited to fine-grained image recognition tasks such as food image recognition. To improve the performance of the CNN representation for food image recognition, we propose a novel image representation composed of the covariances of convolutional-layer feature maps. In experiments on the ETHZ Food-101 dataset, our method achieved 58.65% average accuracy, outperforming previous methods such as the Bag-of-Visual-Words histogram, the Improved Fisher Vector, and CNN-SVM.
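The descriptor can be sketched directly: treat each spatial position of a convolutional feature map as an observation and compute the covariance across channels. Which layer is tapped and any normalization applied afterwards are assumptions not stated in the abstract.

```python
# Hedged sketch of a covariance-of-feature-maps descriptor.
import torch

def covariance_descriptor(feature_map):
    """feature_map: (C, H, W) activations from one convolutional layer."""
    C = feature_map.shape[0]
    x = feature_map.reshape(C, -1)              # (C, H*W) observations
    x = x - x.mean(dim=1, keepdim=True)         # center per channel
    cov = x @ x.t() / (x.shape[1] - 1)          # (C, C) channel covariance
    iu = torch.triu_indices(C, C)               # covariance is symmetric,
    return cov[iu[0], iu[1]]                    # so keep the upper triangle

fmap = torch.randn(64, 28, 28)                  # e.g. a mid-layer output
desc = covariance_descriptor(fmap)
print(desc.shape)                               # torch.Size([2080])
```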
